Clustering Geometric Data Streams
Authors
Abstract
Building on recent results in data stream clustering, we present a modified approach to the facility location problem in the context of geometric data streams. We give insight into the existing algorithm from a less mathematical point of view, focusing on understanding and practical use, namely by computer graphics experts. We propose a modification of the original data stream k-median clustering to solve facility location, the case in which the number of clusters in the input data is not known a priori. Like the original, the modified version is capable of processing millions of points while using a rather small amount of memory. Based on our experiments with clustering geometric data, we present suggestions on how to set the processing parameters. We also describe how the algorithm handles various distributions of input data within the stream. These findings may also be applied to the original algorithm.
CR Categories: I.5.3 [Computing Methodologies]: Pattern Recognition—Clustering; I.3.5 [Computing Methodologies]: Computer Graphics—Computational Geometry and Object Modeling
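The difference between k-median and facility location mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard textbook objectives, not the paper's streaming algorithm: k-median fixes the number of centers in advance, while facility location charges a cost per opened center and lets the number of centers follow from that cost. The function names and the uniform `facility_cost` parameter are assumptions made for illustration.

```python
import math

def assignment_cost(points, centers):
    """Sum of Euclidean distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def k_median_cost(points, centers, k):
    """k-median objective: at most k centers, pay only assignment cost."""
    if len(centers) > k:
        raise ValueError("k-median allows at most k centers")
    return assignment_cost(points, centers)

def facility_location_cost(points, centers, facility_cost):
    """Facility location objective with a uniform opening cost (assumed)."""
    return facility_cost * len(centers) + assignment_cost(points, centers)

# Two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_center = [(5.0, 1.0)]
two_centers = [(0.0, 1.0), (10.0, 1.0)]

# A cheap facility favors opening two centers ...
cheap_one = facility_location_cost(points, one_center, 3.0)
cheap_two = facility_location_cost(points, two_centers, 3.0)
# ... while an expensive facility favors a single center.
costly_one = facility_location_cost(points, one_center, 20.0)
costly_two = facility_location_cost(points, two_centers, 20.0)
```

In this sketch the number of clusters is never passed to `facility_location_cost`; it is determined indirectly by how the opening cost trades off against the assignment cost.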
Similar resources
Distributed Dual Cluster Algorithm Based on Grid for Sensor Streams
In practical applications, Wireless Sensor Networks (WSN) generate massive data streams with dual attributes in the geographic and optimization domains. The energy source of sensor nodes in a WSN is usually limited, and data stream transmission is known to be the largest consumer of energy in a WSN. Therefore, reducing total data transmission and maximizing energy efficiency are the major challenges in WSN. In ad...
Sensitivity Sampling Over Dynamic Geometric Data Streams with Applications to k-Clustering
Sensitivity based sampling is crucial for constructing nearly-optimal coreset for k-means / median clustering. In this paper, we provide a novel data structure that enables sensitivity sampling over a dynamic data stream, where points from a high dimensional discrete Euclidean space can be either inserted or deleted. Based on this data structure, we provide a one-pass coreset construction for k...
Geometric Data Perturbation Techniques in Privacy Preserving On Data Stream Mining
Data mining is the information technology that extracts valuable knowledge from large amounts of data. Due to the emergence of data streams as a new type of data, data stream mining has recently become a very important and popular research issue. Privacy preservation in data stream mining is a very important issue. In this dissertation work, an approach based on Geometric data perturbation...
Monitoring Distributed Data Streams through Node Clustering
Monitoring data streams in a distributed system is a challenging problem with profound applications. The task of feature selection (e.g., by monitoring the information gain of various features) is an example of an application that requires special techniques to avoid a very high communication overhead when performed using straightforward centralized algorithms. Motivated by recent contributions...
Geometric Monitoring of Heterogeneous Streams (Long Version, with Proofs of the Theorems)
Interest in stream monitoring is shifting toward the distributed case. In many applications the data is high volume, dynamic, and distributed, making it infeasible to collect the distinct streams to a central node for processing. Often, the monitoring problem consists of determining whether the value of a global function, defined on the union of all streams, crossed a certain threshold. We wish...